set.seed(28)
n = 150 # fixed total sample size
GR = 2 # n2/n1
n1 = round(n / (GR + 1))
n2 = round(n1 * GR)
sd1 = 1
SDR = 2 # sd2/sd1
sd2 = sd1 * SDR

Data Simulation with Monte Carlo Methods
Marko Bachl
University of Hohenheim
Introduction & overview
Monte Carlo Simulation?
Proof by simulation — the Central Limit Theorem (CLT)
Errors and power — torturing the t-test
Misclassification and bias — Messages mismeasured
Outlook: What’s next?
Comparison of Student’s and Welch’s t-tests
A priori power calculation for Welch’s t-test
Student’s t-test for two independent means
\(t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\), where \(s_p = \sqrt{\frac{(n_1-1)s_{X_1}^2 + (n_2-1)s_{X_2}^2}{n_1 + n_2 - 2}}\) and \(df = n_1 + n_2 - 2\)
Welch’s t-test for two independent means
\(t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar\Delta}}\), where \({\bar\Delta} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\) and \(df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1-1} + \frac{\left(s_2^2/n_2\right)^2}{n_2-1}}\)
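The Welch statistic and degrees of freedom can be verified by hand against t.test(); a minimal sketch with simulated data (sample sizes chosen only for illustration):

```r
# Compute Welch's t and df manually and compare with t.test() (base R)
set.seed(1)
g1 = rnorm(30, mean = 0, sd = 1)
g2 = rnorm(60, mean = 0, sd = 2)

v1 = var(g1) / length(g1)  # s1^2 / n1
v2 = var(g2) / length(g2)  # s2^2 / n2

t_manual  = (mean(g1) - mean(g2)) / sqrt(v1 + v2)
df_manual = (v1 + v2)^2 /
  (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

tt = t.test(g1, g2)
all.equal(unname(tt$statistic), t_manual)   # should be TRUE
all.equal(unname(tt$parameter), df_manual)  # should be TRUE
```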
Old school advice: Student’s t-test for equal variances, Welch’s t-test for unequal variances.
Modern advice: Always use Welch’s t-test.
Better if assumptions are violated; not worse if assumptions hold.
e.g., Delacre, M., Lakens, D., & Leys, C. (2017). Why Psychologists Should by Default Use Welch’s t-test Instead of Student’s t-test. International Review of Social Psychology, 30(1). https://doi.org/10.5334/irsp.82
For those who don’t care about t-tests: Idea also applies to heteroskedasticity-consistent standard errors.
Question: What is the goal of the simulation?
Quantities of interest: What is measured in the simulation?
Evaluation strategy: How are the quantities assessed?
Conditions: Which characteristics of the data generating model will be varied?
Data generating model: How are the data simulated?
How does the t.test() work in R?
Which parameters and functions do we need for generating the data?
How do we implement the experimental design for the simulation study?
How do we run the simulation?
How do we assess and visualize the results?
Two factors:
Group sizes: Equal or unequal?
Group standard deviations: Equal or unequal?
Implementation with ratios and fixed total sample size.
g1 = rnorm(n = n1, mean = 0, sd = sd1)
g2 = rnorm(n = n2, mean = 0, sd = sd2)
t.test(g1, g2) # Welch (default)
t.test(g1, g2, var.equal = TRUE) # Student
Welch Two Sample t-test
data: g1 and g2
t = 0.74022, df = 147.94, p-value = 0.4603
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.3198803 0.7030466
sample estimates:
mean of x mean of y
-0.09838857 -0.28997168
Two Sample t-test
data: g1 and g2
t = 0.60009, df = 148, p-value = 0.5494
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.4393035 0.8224698
sample estimates:
mean of x mean of y
-0.09838857 -0.28997168
t.test() returns an object of class "htest", a list of 10 components:
List of 10
$ statistic : Named num 0.74
..- attr(*, "names")= chr "t"
$ parameter : Named num 148
..- attr(*, "names")= chr "df"
$ p.value : num 0.46
$ conf.int : num [1:2] -0.32 0.703
..- attr(*, "conf.level")= num 0.95
$ estimate : Named num [1:2] -0.0984 -0.29
..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
$ null.value : Named num 0
..- attr(*, "names")= chr "difference in means"
$ stderr : num 0.259
$ alternative: chr "two.sided"
$ method : chr "Welch Two Sample t-test"
$ data.name : chr "g1 and g2"
- attr(*, "class")= chr "htest"
sim_ttest = function(GR = 1, SDR = 1, n = 150) {
n1 = round(n / (GR + 1))
n2 = round(n1 * GR)
sd1 = 1
sd2 = sd1 * SDR
g1 = rnorm(n = n1, mean = 0, sd = sd1)
g2 = rnorm(n = n2, mean = 0, sd = sd2)
welch = t.test(g1, g2)$p.value
student = t.test(g1, g2, var.equal = TRUE)$p.value
res = list("Welch" = welch, "Student" = student)
return(res)
}

expand_grid() is the tidy implementation of base::expand.grid(). Crossing the GR and SDR factors yields six conditions:

| condition | GR | SDR | implies |
|---|---|---|---|
| 1 | 1 | 0.5 | n1 = 75, n2 = 75, sd1 = 1, sd2 = 0.5 |
| 2 | 1 | 1.0 | n1 = 75, n2 = 75, sd1 = 1, sd2 = 1 |
| 3 | 1 | 2.0 | n1 = 75, n2 = 75, sd1 = 1, sd2 = 2 |
| 4 | 2 | 0.5 | n1 = 50, n2 = 100, sd1 = 1, sd2 = 0.5 |
| 5 | 2 | 1.0 | n1 = 50, n2 = 100, sd1 = 1, sd2 = 1 |
| 6 | 2 | 2.0 | n1 = 50, n2 = 100, sd1 = 1, sd2 = 2 |
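The conditions tibble can be built by crossing the two factors with expand_grid(); a minimal sketch, assuming tidyr and tibble are installed:

```r
library(tidyr)   # expand_grid()
library(tibble)  # rowid_to_column()

# All combinations of group-size ratio and SD ratio;
# expand_grid() varies the last factor fastest, matching the table above
conditions = expand_grid(GR = c(1, 2), SDR = c(0.5, 1, 2))
conditions = rowid_to_column(conditions, var = "condition")
conditions
```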
set.seed(689)
i = 10000 # number of sim runs per condition
tic() # simple way to measure wall time
sims = map_dfr(1:i, ~ conditions) %>% # each condition i times
rowid_to_column(var = "sim") %>% # within simulation comparison
rowwise() %>%
mutate(p.value = list(sim_ttest(GR = GR, SDR = SDR))) %>%
unnest_longer(p.value, indices_to = "method")
toc() # simple way to measure wall time
14.832 sec elapsed
# A tibble: 120,000 × 6
sim condition GR SDR p.value method
<int> <int> <dbl> <dbl> <dbl> <chr>
1 1 1 1 0.5 0.345 Welch
2 1 1 1 0.5 0.344 Student
3 2 2 1 1 0.534 Welch
4 2 2 1 1 0.534 Student
5 3 3 1 2 0.647 Welch
6 3 3 1 2 0.647 Student
7 4 4 2 0.5 0.00985 Welch
8 4 4 2 0.5 0.00224 Student
9 5 5 2 1 0.602 Welch
10 5 5 2 1 0.614 Student
# … with 119,990 more rows
# ℹ Use `print(n = ...)` to see more rows
| GR | SDR | method | P_p001 | P_p01 | P_p05 |
|---|---|---|---|---|---|
| 1 | 0.5 | Student | 0.001 | 0.010 | 0.049 |
| 1 | 0.5 | Welch | 0.001 | 0.010 | 0.048 |
| 1 | 1.0 | Student | 0.000 | 0.010 | 0.048 |
| 1 | 1.0 | Welch | 0.000 | 0.010 | 0.048 |
| 1 | 2.0 | Student | 0.001 | 0.011 | 0.047 |
| 1 | 2.0 | Welch | 0.001 | 0.010 | 0.047 |
| 2 | 0.5 | Student | 0.008 | 0.034 | 0.107 |
| 2 | 0.5 | Welch | 0.002 | 0.010 | 0.049 |
| 2 | 1.0 | Student | 0.001 | 0.010 | 0.051 |
| 2 | 1.0 | Welch | 0.000 | 0.010 | 0.052 |
| 2 | 2.0 | Student | 0.000 | 0.002 | 0.016 |
| 2 | 2.0 | Welch | 0.001 | 0.009 | 0.052 |
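A table like the one above can be computed from the sims tibble by grouping on condition and method and taking the proportion of p-values below each alpha threshold; a sketch with dplyr (the tiny sims_demo stand-in here is only for illustration, and the column names follow the tibble shown earlier):

```r
library(dplyr)

# Stand-in for the sims tibble produced by the simulation run above
sims_demo = tibble(
  GR = 1, SDR = 1,
  method  = rep(c("Welch", "Student"), each = 5),
  p.value = c(0.0005, 0.03, 0.2, 0.6, 0.9, 0.004, 0.04, 0.3, 0.7, 0.95)
)

# Proportion of rejections at each alpha level, per condition and method
rejection_rates = sims_demo %>%
  group_by(GR, SDR, method) %>%
  summarise(
    P_p001 = mean(p.value < .001),
    P_p01  = mean(p.value < .01),
    P_p05  = mean(p.value < .05),
    .groups = "drop"
  )
rejection_rates
```

With the data simulated under a true null hypothesis, these proportions estimate each test's Type I error rate and should stay close to the nominal alpha.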
Investigate how the amount of disparity in group sizes and standard deviations changes the results.
When do the differences between the methods become negligible?
When does the false positive rate of Student’s t-test become seriously inflated?
Adapt the experimental setup accordingly.
Change or expand the levels of the GR and SDR factors.
Consider changing the experimental design to make the simulation more efficient.